Overview

  • Hour 1
    • JSON
    • Regular Expressions
  • Hour 2
    • Q&A
  • Hour 3
    • Work Time

JSON Format

Strings can be structured so that they encode more complex information. For example, CSV (Comma-Separated Values) can be used to share spreadsheet data between applications.

JSON (JavaScript Object Notation) is used to transmit ordered lists (arrays) and name-value pairs (objects). It can be sent in a file or through a web request.

Syntax

The syntax for a JSON array is similar to the syntax for a Python list.

["Monday",  6, "pumpkin", 3.1415]

The quotation marks around strings are mandatory.
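
As a quick check of the array syntax, here is a minimal sketch that parses the example with Python's standard json module. The variable names are made up for illustration.

```python
import json

# Parse the JSON array from the text above into a Python list.
parsed = json.loads('["Monday", 6, "pumpkin", 3.1415]')
print(parsed)            # ['Monday', 6, 'pumpkin', 3.1415]

# Single-quoted strings are valid Python but not valid JSON.
try:
    json.loads("['Monday']")
    single_quotes_ok = True
except ValueError:       # json.JSONDecodeError is a subclass of ValueError
    single_quotes_ok = False
print(single_quotes_ok)  # False
```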

The syntax for a JSON object is similar to the syntax for a Python dictionary.

{
    "latitude_degree": 43.6617,
    "latitude_direction": "N",
    "longitude_degree": 79.3950,
    "longitude_direction": "W"
}

Arrays and objects can appear as values within arrays and objects. Here is an example of a complex JSON string, formatted for readability by humans.

[
    {
        "customer": {
            "last": "Ball",
            "first": "Maldonado"
        },
        "rating": 2,
        "specialOrder": true,
        "tags": [
            "ullamco",
            "aute",
            "mollit",
            "ex"
        ],
        "greeting": "Happy Birthday",
        "filling": "chocolate",
        "batter": "vanilla",
        "frosting": "chocolate"
    }
]

As you can see, it's a list containing one order dictionary, with a nested dictionary for the customer's name and a list of strings for the tags. The formatting is similar to Python, although JSON writes true, false, and null where Python writes True, False, and None.

A useful tool when working with JSON is a viewer. JSON that is being transmitted usually has all of the whitespace removed. This makes the total transmission smaller, but makes it difficult for humans to read. It's common to use a JSON viewer, in your IDE or on a web page. You can find one by Googling.
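
If no viewer is at hand, Python can do a rough version of the same job. The sketch below shows the compact and the readable form of the same data, using the separators and indent parameters of json.dumps. The sample dictionary is a shortened copy of the coordinates example.

```python
import json

coords = {"latitude_degree": 43.6617, "latitude_direction": "N"}

# Compact form, as typically transmitted: no spaces after separators.
compact = json.dumps(coords, separators=(",", ":"))
print(compact)  # {"latitude_degree":43.6617,"latitude_direction":"N"}

# Human-readable form, similar to what a JSON viewer displays.
pretty = json.dumps(coords, indent=2)
print(pretty)
```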

Another common tool is a JSON validator. It checks whether your syntax is correct and helps to locate errors. Sometimes you can use the same tool to view and validate JSON.
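
A very small validator can be sketched with the json module itself. validate_json is a hypothetical helper written for this example, not a library function. It relies on json.JSONDecodeError (Python 3.5 and later), which records the line and column of the problem.

```python
import json

def validate_json(text):
    """Return (True, None) if text is valid JSON, else (False, message)."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # lineno and colno point at the place where parsing failed.
        return False, "line {}, column {}: {}".format(err.lineno, err.colno, err.msg)
    return True, None

print(validate_json('{"rating": 2}'))
print(validate_json('{"rating": 2,}'))  # a trailing comma is not allowed in JSON
```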

Writing JSON

Python data structures built from dictionaries, lists, strings, numbers, Booleans, and None can be saved as JSON.


In [57]:
import json

list1 = ["Monday",  6, "pumpkin", 3.1415]
dict1 = {
           "latitude_degree": 43.6617, 
           "latitude_direction": "N",
           "longitude_degree": 79.3950, 
           "longitude_direction": "W"
        }


json_encoded = json.dumps(dict1)

print(json_encoded)


{"latitude_direction": "N", "longitude_degree": 79.395, "latitude_degree": 43.6617, "longitude_direction": "W"}
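
Not every Python type has a JSON equivalent. The sketch below shows what the standard json module does with a tuple, with None and Booleans, and with a set. The values are arbitrary.

```python
import json

# Tuples have no JSON equivalent, so they are written as arrays.
tuple_json = json.dumps((1, 2, 3))
print(tuple_json)        # [1, 2, 3]

# None, True, and False map to null, true, and false.
special_json = json.dumps([None, True, False])
print(special_json)      # [null, true, false]

# Sets are not supported by default and raise a TypeError.
try:
    json.dumps({1, 2, 3})
    set_supported = True
except TypeError:
    set_supported = False
print(set_supported)     # False
```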

Reading JSON

The json module provides functions for decoding JSON text into Python data structures (json.loads) and for encoding data structures as JSON text (json.dumps).


In [59]:
import json

with open("cake.json", "r") as file_reader:
    file_contents = file_reader.read()

    
json_contents = json.loads(file_contents)

# print(json_contents)
print(json.dumps(json_contents, indent=1))


[
 {
  "customer": {
   "last": "Ball", 
   "first": "Maldonado"
  }, 
  "rating": 2, 
  "specialOrder": true, 
  "tags": [
   "ullamco", 
   "aute", 
   "mollit", 
   "ex"
  ], 
  "greeting": "Happy Birthday", 
  "filling": "chocolate", 
  "batter": "vanilla", 
  "frosting": "chocolate"
 }, 
 {
  "customer": {
   "last": "Gallegos", 
   "first": "Priscilla"
  }, 
  "rating": 4, 
  "specialOrder": true, 
  "tags": [
   "occaecat", 
   "ullamco"
  ], 
  "greeting": "Happy Halloween", 
  "filling": "cream cheese", 
  "batter": "vanilla", 
  "frosting": "cream cheese", 
  "_id": "544dbb15755bad877f163ce6"
 }, 
 {
  "customer": {
   "last": "Dunlap", 
   "first": "Lydia"
  }, 
  "rating": 2, 
  "specialOrder": false, 
  "tags": [
   "reprehenderit", 
   "sunt", 
   "officia", 
   "velit"
  ], 
  "greeting": "Congratulations", 
  "filling": "vanilla", 
  "batter": "chocolate", 
  "frosting": "vanilla", 
  "_id": "544dbb15f27aff34b75532f4"
 }, 
 {
  "customer": {
   "last": "Barron", 
   "first": "Adeline"
  }, 
  "rating": 0, 
  "specialOrder": true, 
  "tags": [
   "consectetur", 
   "consectetur"
  ], 
  "greeting": "Happy Birthday", 
  "filling": "cream cheese", 
  "batter": "vanilla", 
  "frosting": "chocolate", 
  "_id": "544dbb1544b3ca9c14246ff6"
 }, 
 {
  "customer": {
   "last": "Meyers", 
   "first": "Conley"
  }, 
  "rating": 4, 
  "specialOrder": false, 
  "tags": [
   "cillum", 
   "ea", 
   "duis"
  ], 
  "greeting": "Congratulations", 
  "filling": "custard", 
  "batter": "vanilla", 
  "frosting": "vanilla", 
  "_id": "544dbb15ac1eba4a44347dcf"
 }, 
 {
  "customer": {
   "last": "Armstrong", 
   "first": "Peterson"
  }, 
  "rating": 1, 
  "specialOrder": true, 
  "tags": [
   "adipisicing", 
   "sit", 
   "mollit", 
   "anim"
  ], 
  "greeting": "Congratulations", 
  "filling": "cream cheese", 
  "batter": "carrot", 
  "frosting": "chocolate", 
  "_id": "544dbb15a350974cd50f19e7"
 }
]
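
The read-then-decode steps can also be done in one call. This is a minimal sketch using json.dump and json.load, which work on file objects directly. It writes to a temporary file with a made-up name so that it does not depend on cake.json being present.

```python
import json
import os
import tempfile

orders = [{"customer": {"last": "Ball", "first": "Maldonado"}, "rating": 2}]

# Write to a temporary file so the sketch does not depend on cake.json.
path = os.path.join(tempfile.mkdtemp(), "orders.json")  # hypothetical file name
with open(path, "w") as file_writer:
    json.dump(orders, file_writer, indent=1)

# json.load reads and decodes in one step.
with open(path, "r") as file_reader:
    loaded = json.load(file_reader)

print(loaded == orders)  # True
```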

json_contents is typically a list, a dictionary, or a combination of the two. You already know how to use these. You can use a for loop to iterate over all the elements in a list. You can use dictionary syntax to access key-value pairs.


In [71]:
for order in json_contents:
    print("Customer: " + order['customer']['last'])

    for word in order['tags']:
        print("\t" + word)


Customer: Ball
	ullamco
	aute
	mollit
	ex
Customer: Gallegos
	occaecat
	ullamco
Customer: Dunlap
	reprehenderit
	sunt
	officia
	velit
Customer: Barron
	consectetur
	consectetur
Customer: Meyers
	cillum
	ea
	duis
Customer: Armstrong
	adipisicing
	sit
	mollit
	anim
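
Because the decoded data is ordinary lists and dictionaries, the usual Python tools apply. The sketch below filters orders with a list comprehension. The orders list is a cut-down copy of three records from the output above, trimmed to keep the example short.

```python
orders = [
    {"customer": {"last": "Ball"}, "rating": 2, "filling": "chocolate"},
    {"customer": {"last": "Gallegos"}, "rating": 4, "filling": "cream cheese"},
    {"customer": {"last": "Barron"}, "rating": 0, "filling": "cream cheese"},
]

# Last names of customers who gave a rating of 2 or higher.
happy = [order["customer"]["last"] for order in orders if order["rating"] >= 2]
print(happy)  # ['Ball', 'Gallegos']

# .get() returns None instead of raising KeyError when a key is absent,
# which helps because the first order in the output above has no "_id".
first_id = orders[0].get("_id")
print(first_id)  # None
```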

Regular Expressions

Regular expressions, called regexes for short, are descriptions of patterns of text. They are useful for validating inputs and formatting.

For example, a \d in a regex stands for a digit character, that is, any single numeral 0 to 9. The regex \d\d\d-\d\d\d-\d\d\d\d can be used in Python to match a telephone number as a string of three digits, a hyphen, three more digits, another hyphen, and four digits. Any other string would not match the \d\d\d-\d\d\d-\d\d\d\d regex.

But regular expressions can be much more sophisticated. For example, adding a 3 in curly brackets ({3}) after a pattern is like saying, “Match this pattern three times.” So the slightly shorter regex \d{3}-\d{3}-\d{4} also matches the correct phone number format.
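
To see that the two spellings behave the same, here is a small sketch. It uses re.compile and search, which the sections that follow introduce. The sample strings are made up.

```python
import re

long_form = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
short_form = re.compile(r'\d{3}-\d{3}-\d{4}')

text = 'My number is 415-555-4242.'
long_result = long_form.search(text).group()
short_result = short_form.search(text).group()
print(long_result)    # 415-555-4242
print(short_result)   # 415-555-4242

# A number without an area code does not fit the pattern.
no_area_code = short_form.search('Call 555-4242')
print(no_area_code is None)  # True
```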

Creating Regex Objects


In [72]:
import re

phone_regex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')

Matching Regex Objects


In [81]:
phone_match = phone_regex.search('My number is 647-970-9425.')

if phone_match is None:
    print("No match found")
else:
    print('Phone number found: ' + phone_match.group())


Phone number found: 647-970-9425

  • search() moves through the string from start to end, stopping at the first match found
  • All of the pattern must match
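
Both points can be checked with a short sketch. The phone numbers here are invented for the example.

```python
import re

phone_regex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')

# Two numbers in the text: search() reports only the first one.
match = phone_regex.search('Home: 415-555-1011, work: 415-555-9999')
print(match.group())   # 415-555-1011
print(match.start())   # 6

# A number missing its last digit does not match at all.
partial = phone_regex.search('Call 415-555-101')
print(partial is None)  # True
```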

Grouping with Parentheses

  • Parentheses are used to group parts of an expression
  • Groups can be used to retrieve parts of the match

In [88]:
phone_regex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
phone_match = phone_regex.search('My number is 415-555-4242.')

print(phone_match.group())
print(phone_match.group(0))
print(phone_match.group(1))
print(phone_match.group(2))


415-555-4242
415-555-4242
415
555-4242

In [87]:
print(phone_match.groups())


('415', '555-4242')

In [90]:
area_code, main_number = phone_match.groups()

print(area_code)
print(main_number)


415
555-4242

Special Operators

  • ^ = start, $ = end -- match the start or end of the string
  • \ -- inhibit the "specialness" of a character. So, for example, use \. to match a period or \\ to match a backslash. If you are unsure whether a punctuation character has special meaning, such as '@', you can put a backslash in front of it, \@, to make sure it is treated as an ordinary character. (Do not do this with letters or digits, where the backslash creates a special sequence such as \d.)
  • | (pipe, on the backslash key) -- alternation, for combining patterns
  • ? -- optional matching
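
The cells below cover ^, | and ?. For the backslash and $, here is a minimal sketch with made-up strings. re.escape is a standard library function that adds the backslashes for you.

```python
import re

# An unescaped . matches any character, so 'a.c' also matches 'abc'.
print(re.search(r'a.c', 'abc') is not None)    # True
# Escaped, \. matches only a literal period.
print(re.search(r'a\.c', 'abc') is None)       # True
print(re.search(r'a\.c', 'a.c').group())       # a.c

# $ anchors the pattern to the end of the string.
print(re.search(r'\d$', 'Room 42').group())    # 2
print(re.search(r'\d$', '42 rooms') is None)   # True

# re.escape() returns the text with special characters escaped.
escaped = re.escape('3.1415')
print(escaped)                                 # 3\.1415
```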

In [91]:
cheese_regex = re.compile(r'^Cheese')
cheese_match = cheese_regex.search('Cheese Shop sketch')
print(cheese_match.group())

cheese_match = cheese_regex.search('Not much of a Cheese Shop really, is it?')
cheese_match is None


Cheese
Out[91]:
True

In [97]:
monty_regex = re.compile(r"Terry|Michael|Graham|John|Eric")
monty_match = monty_regex.search("Eric Palin, John Cleese, and Eric Idle")

print("{} at {}".format(monty_match.group(), monty_match.start()))

monty_match = monty_regex.findall("Eric Palin, John Cleese, and Eric Idle")
print(monty_match)


Eric at 0
['Eric', 'John', 'Eric']

In [100]:
parrot_regex = re.compile(r'Parrot(man|mobile|copter|bat)')
parrot_match = parrot_regex.search('Parrotcopter lost a blade')
parrot_match.group()


Out[100]:
'Parrotcopter'

In [101]:
parrot_regex = re.compile(r'Parrot(wo)?man')
parrot_match = parrot_regex.search('The Adventures of Parrotman')
print(parrot_match.group())

parrot_match = parrot_regex.search('The Adventures of Parrotwoman')
print(parrot_match.group())


Parrotman
Parrotwoman

Wildcard Characters

  • . (a period) -- matches any single character except newline '\n'
  • * (asterisk) -- match zero or more repetitions of the preceding pattern, which also makes it optional
  • + -- match one or more repetitions of the preceding pattern

In [102]:
parrot_regex = re.compile(r'Parrot(wo)*man')

parrot_match = parrot_regex.search('The Adventures of Parrotman')
print(parrot_match.group())

parrot_match = parrot_regex.search('The Adventures of Parrotwowowoman')
print(parrot_match.group())


Parrotman
Parrotwowowoman

In [103]:
parrot_regex = re.compile(r'Parrot(wo)+man')

parrot_match = parrot_regex.search('The Adventures of Parrotwoman')
print(parrot_match.group())

parrot_match = parrot_regex.search('The Adventures of Parrotman')
parrot_match is None


Parrotwoman
Out[103]:
True

In [104]:
# Match everything with .*

name_regex = re.compile(r'First Name: (.*) Last Name: (.*)')
name_match = name_regex.search("First Name: Tarquin Last Name: Fin-tim-lim-bim-whin-bim-lim-bus-stop-F'tang-F'tang-Ole-Biscuitbarrel")
name_match.groups()


Out[104]:
('Tarquin',
 "Fin-tim-lim-bim-whin-bim-lim-bus-stop-F'tang-F'tang-Ole-Biscuitbarrel")

Matching Specific Repetitions

Another way of writing (Ha)(Ha)(Ha) is (Ha){3}


In [107]:
ha_regex = re.compile(r'(Ha){3}')
ha_match = ha_regex.search('HaHaHa')
ha_match.group()


Out[107]:
'HaHaHa'

  • (Ha){3} means exactly 3 repetitions
  • (Ha){,5} means up to 5 repetitions
  • (Ha){3,5} means 3 to 5 repetitions
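
These counts can be checked with a short sketch. It uses fullmatch (Python 3.4 and later), which succeeds only when the whole string fits the pattern. The variable names are made up.

```python
import re

exactly_three = re.compile(r'(Ha){3}')
up_to_five = re.compile(r'(Ha){,5}')
three_to_five = re.compile(r'(Ha){3,5}')

print(exactly_three.fullmatch('HaHaHa') is not None)    # True
print(exactly_three.fullmatch('HaHa') is None)          # True
print(up_to_five.fullmatch('') is not None)             # True (zero repetitions)
print(up_to_five.fullmatch('Ha' * 6) is None)           # True
print(three_to_five.fullmatch('Ha' * 4) is not None)    # True
```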

  • Python regular expression matching is greedy by default.
    • It takes the first match and makes it as long as possible.
    • In ambiguous situations, the longest string possible is matched.
  • To make a quantifier non-greedy (match as little as possible), put a ? (question mark) directly after it

In [108]:
greedy_regex = re.compile(r'(Ha){3,5}')
greedy_match = greedy_regex.search('HaHaHaHaHa')
print(greedy_match.group())

nongreedy_regex = re.compile(r'(Ha){3,5}?')
nongreedy_match = nongreedy_regex.search('HaHaHaHaHa')
print(nongreedy_match.group())


HaHaHaHaHa
HaHaHa
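
The same ? works after * and +. Here is a minimal sketch of greedy and non-greedy .* on a made-up string with two closing angle brackets.

```python
import re

text = '<To serve man> for dinner.>'

# Greedy: .* runs to the last > in the string.
greedy = re.search(r'<.*>', text)
print(greedy.group())      # <To serve man> for dinner.>

# Non-greedy: .*? stops at the first > it can.
nongreedy = re.search(r'<.*?>', text)
print(nongreedy.group())   # <To serve man>
```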

Character Classes

Another way of writing (0|1|2|3|4|5|6|7|8|9) is \d, which is a shorthand character class.


In [109]:
gbs_regex = re.compile(r'\d+\s\w+')
gbs_regex.findall("12 apostles, 11 went straight to heaven, "
                  "10 commandments, 9 bright eyed shiners, "
                  "8 Gabriel angels")


Out[109]:
['12 apostles', '11 went', '10 commandments', '9 bright', '8 Gabriel']
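
The cell above combines three shorthand classes: \d (digit), \s (whitespace), and \w (letter, digit, or underscore). A small sketch of each on its own, with a made-up string that contains a tab:

```python
import re

text = 'Order 66: 3 cakes\t2 pies'

digits = re.findall(r'\d+', text)
words = re.findall(r'\w+', text)
pieces = re.split(r'\s+', text)   # split wherever there is whitespace

print(digits)   # ['66', '3', '2']
print(words)    # ['Order', '66', '3', 'cakes', '2', 'pies']
print(pieces)   # ['Order', '66:', '3', 'cakes', '2', 'pies']
```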

Custom Character Classes

  • Use square brackets to list the characters that may match
  • Use a caret (^) immediately after the opening bracket to negate the class
    • It's on the 6 key

In [110]:
vowel_regex = re.compile(r'[aeiouAEIOU]')
vowel_regex.findall('This is a dead parrot. DEAD.')


Out[110]:
['i', 'i', 'a', 'e', 'a', 'a', 'o', 'E', 'A']
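
The cell above does not show negation, so here is a minimal sketch. It also uses a range (A-Z) inside the brackets. The pattern names are made up.

```python
import re

# [^...] matches any single character NOT listed in the brackets.
# Inside brackets a period is literal, so no backslash is needed.
non_vowel_regex = re.compile(r'[^aeiouAEIOU\s.]')
consonants = non_vowel_regex.findall('This is a dead parrot.')
print(consonants)  # ['T', 'h', 's', 's', 'd', 'd', 'p', 'r', 'r', 't']

# A hyphen between two characters gives a range.
capitals = re.findall(r'[A-Z]', 'Dead Parrot Sketch')
print(capitals)    # ['D', 'P', 'S']
```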

Compilation Flags

  • DOTALL, S -- Make . match any character, including newlines
  • IGNORECASE, I -- Do case-insensitive matches
  • LOCALE, L -- Do a locale-aware match
  • MULTILINE, M -- Multi-line matching, affecting ^ and $
  • VERBOSE, X -- Enable verbose REs, which can be organized more cleanly and understandably.
  • UNICODE, U -- Makes several escapes like \w, \b, \s and \d dependent on the Unicode character database.

In [111]:
vowel_regex = re.compile(r'[aeiou]', re.I)
vowel_regex.findall('This is a dead parrot. DEAD.')


Out[111]:
['i', 'i', 'a', 'e', 'a', 'a', 'o', 'E', 'A']
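
Three of the other flags can be sketched the same way. The strings are made up, and the verbose pattern is the grouped phone regex from earlier, spread over several lines.

```python
import re

text = 'spam\neggs'

# Without DOTALL, . stops at the newline.
print(re.search(r'spam.eggs', text) is None)                  # True
print(re.search(r'spam.eggs', text, re.DOTALL) is not None)   # True

# With MULTILINE, ^ also matches at the start of each line.
print(re.findall(r'^\w+', text))                # ['spam']
print(re.findall(r'^\w+', text, re.MULTILINE))  # ['spam', 'eggs']

# VERBOSE allows whitespace and comments inside the pattern.
phone_regex = re.compile(r'''
    (\d{3})         # area code
    -               # separator
    (\d{3}-\d{4})   # main number
''', re.VERBOSE)
parts = phone_regex.search('My number is 415-555-4242.').groups()
print(parts)  # ('415', '555-4242')
```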

In [ ]:
# Exercise
# Joke: What do you call a pig with three eyes? Piiig!

Why don't we use regex instead of string matching all the time?

  • It's more computationally expensive
  • If you're doing something simple, stick with strings
  • If you can't do it any other way, use regex
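
For simple checks, string methods give the same answer as a regex. A small sketch comparing the two, reusing a string from the cheese example:

```python
import re

text = 'Cheese Shop sketch'

# Each pair gives the same answer; the string method is the simpler tool.
starts = (text.startswith('Cheese'), re.search(r'^Cheese', text) is not None)
contains = ('Shop' in text, re.search(r'Shop', text) is not None)
ends = (text.endswith('sketch'), re.search(r'sketch$', text) is not None)

print(starts)    # (True, True)
print(contains)  # (True, True)
print(ends)      # (True, True)
```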

The Answer Game

Two roles: person asking the question (Q) and the person answering the question (A)

  • Q takes question to A
  • If A can answer it, A keeps the piece of paper
  • Q takes next question to another A, repeat until all questions are answered

If you ask a question 3 times and can't get an answer, bring it to me.

